Data processing apparatus address range dependent parallelization of instructions

ABSTRACT

A data processing apparatus has an instruction memory system arranged to output an instruction word addressed by an instruction address. An instruction execution unit processes a plurality of instructions from the instruction word in parallel. A detection unit detects in which of a plurality of ranges the instruction address lies. The detection unit is coupled to the instruction execution unit and/or the instruction memory system, to control the way in which the instruction execution unit parallelizes processing of the instructions from the instruction word, dependent on the detected range. In an embodiment, the instruction execution unit and/or the instruction memory system adjusts a width of the instruction word, which determines the number of instructions from the instruction word that are processed in parallel, dependent on the detected range.

The invention relates to a data processing apparatus, such as a VLIW (Very Long Instruction Word) processor, that is capable of executing a plurality of instructions from an instruction word in parallel.

A VLIW processor makes it possible to execute programs with a high degree of instruction parallelism. Conventionally, in each instruction cycle the VLIW processor fetches an instruction word that contains a fixed number, greater than one, of instructions (often called operations). The VLIW processor executes these operations in parallel in the same instruction cycle (or cycles). For this purpose the VLIW processor contains a plurality of functional units, each capable of executing one of the operations from the instruction word at a time. Different kinds of functional units are typically provided, such as ALUs (arithmetic logic units), multipliers, branch control units, memory access units etc. Often dedicated purpose functional units are also included, designed to speed up programs for a particular application. Thus, for example, functional units for performing parts of MPEG encoding or decoding may be added.

In large sections of programs, however, it is impossible to supply operations to all functional units in every instruction cycle. This occurs for example when insufficient data is available to start operations in all functional units. In this case, “no-operation” instructions have to be included in the instruction word for the functional units for which no instruction is available. When such instruction words have to be kept in instruction memory this leads to excessive memory use.

Several measures have been proposed to reduce this excessive memory use. For example, instructions may be compressed, by encoding no-operation instructions more efficiently than other instructions. However, this still involves memory overhead and it potentially slows down the processor. In another development, it has been known to use fields in the instruction word for clusters of functional units, so that any one functional unit of a cluster can get an instruction from the field per instruction cycle. Because the instruction word thus contains only one field for a plurality of functional units this reduces the size of the instruction word, but it reduces the maximum level of parallelism.

U.S. Pat. No. 5,774,737 describes that a single VLIW processor may use instructions with different lengths. The instructions may contain a length code, to indicate their length. Alternatively, an instruction length register may be used which indicates a current length. The functional units execute the number of instructions indicated by the value of the length in the instruction length register. By setting the current length the instruction length can be adapted to the level of parallelism that is permitted in different parts of the program. However, setting the current length involves execution of additional instructions.

Amongst others, it is an object of the invention to improve the memory efficiency of processors that are capable of executing a plurality of instructions from an instruction word in parallel.

Amongst others, it is a further object of the invention to facilitate the use of dedicated purpose functional units without causing excessive memory use.

The processing apparatus according to the invention is set forth in claim 1. According to the invention, detection of the range of addresses from which an instruction word is fetched is used to determine the way in which the instruction execution unit parallelizes processing of the instructions from the instruction word.

In one embodiment, for example, the length of the instruction word is dependent on the range to which its address belongs. Thus, the instruction execution unit may treat information from the instruction memory as relatively longer instruction words, containing relatively more instructions, when these words come from a range of addresses that refer to instructions from the inner loop of a program, and the instruction execution unit may treat the information as relatively shorter instruction words, containing relatively fewer instructions, when these words come from another range of addresses. Thus, high parallelism can be realized in the inner loop and high storage efficiency can be realized outside the inner loop, without need for explicit instructions to change the instruction word length when passing into or out of the inner loop.

In a further embodiment the instruction memory system is adapted to adjust the width of the instruction words that are fetched dependent on the address range. Different types of memory, for example with different speeds, may be used for different ranges. Preferably the supply of clock signals to a part of the instruction memory is disabled when the instruction addresses are not in a range that maps to that part of the instruction memory.

In another embodiment, the instruction execution unit contains a plurality of functional units for executing different instructions from the instruction word. In this embodiment different ones of the functional units are selected to execute instructions from the instruction word, dependent on the address range from which the instruction word is read. Thus, instructions from the instruction word may be treated as instructions for dedicated purpose functional units in one range of addresses and as instructions for other functional units in another range of addresses. MPEG decoding and encoding, for example, are typically limited to specific parts of a program, and therefore functional units that are dedicated to the purpose of such decoding and encoding are only needed in those parts of the program. By selecting these functional units on the basis of the address range, there is no need for an increased width of the instruction word to select which functional units should process the instruction.

In a further embodiment the functional units may use instructions with different widths. Thus, instructions for an ALU functional unit may involve designations of an operation, two operand registers and a result register, whereas instructions for a dedicated purpose functional unit might involve designations of four operand registers and two result registers. Dependent on the address range the width of instructions in the instruction word may be adapted.

These and other objects and advantageous aspects of the apparatus and method according to the invention will be described in more detail using the following figures.

FIG. 1 shows a data processing apparatus

FIG. 2 shows an embodiment of an instruction memory system

FIG. 2A shows part of a data processing apparatus

FIG. 2B shows part of a data processing apparatus

FIG. 3 shows an address range detector

FIG. 4 shows instruction words for the processing apparatus

FIG. 5 shows a flow chart for programming the data processing apparatus

FIG. 6 shows an embodiment of an instruction memory system

FIG. 7 shows a data processor apparatus

FIG. 1 shows a data processing apparatus with an instruction addressing unit 10, an instruction memory system 12, an instruction execution unit 14 and an address range detector 16. The instruction addressing unit 10 has an address output coupled to instruction memory system 12. Instruction memory system 12 has an instruction output coupled to instruction execution unit 14. Instruction execution unit 14 has an output coupled to instruction addressing unit 10. Address range detector 16 has an input coupled to the address output of instruction addressing unit 10 and an output coupled to a control input 11 of instruction execution unit 14 and instruction memory system 12.

Instruction execution unit 14 contains an input section 140, a plurality of functional units 142 and a register file 144. Input section 140 is coupled between instruction memory system 12 and functional units 142. Address range detector 16 is coupled to input section 140. Furthermore, input section 140 has selection outputs coupled to register file 144. Functional units 142 have inputs and outputs coupled to register file 144. At least one of the functional units is a branch control unit having an output coupled to instruction addressing unit 10.

In operation the apparatus operates in successive instruction cycles. In each instruction cycle instruction addressing unit 10 supplies an instruction address to instruction memory system 12. In response instruction memory system 12 retrieves an instruction word addressed by the instruction address and supplies the retrieved instruction word to instruction execution unit 14. Input section 140 passes operation selection codes from the instruction word to functional units 142, and input section 140 passes register selection codes from the instruction word to the selection inputs of register file 144. In response to the register selection codes, register file 144 retrieves operands from registers in register file 144 and supplies this data to functional units 142. In response to the operation selection codes functional units 142 perform selected processing operations, using the operands as input data, and supply the results of these operations to register file 144. Register file 144 stores these results in registers selected by register selection codes from the instruction word. In general, operation will be pipelined, that is, the various actions in response to an instruction address (retrieving the instruction, retrieving operands, processing, storing the results) will be executed during different instruction cycles, at a time when other ones of the actions are performed for preceding and/or subsequent instruction addresses.

The way instruction words are treated depends on the range of addresses in which the instruction address of the instruction word lies. In one embodiment, the width of the instruction word depends on the range. When the instruction address is in a first range a first number of instructions from the instruction word is executed by the functional units 142 and when the instruction address is in a second range a second number of instructions from the instruction word is executed by the functional units 142. Accordingly, input section 140 receives a detection signal from address range detector 16, indicating the range in which the instruction address lies (if need be delayed by a number of instruction cycles, as appropriate for the pipeline delay between addressing and supply of the instruction). Dependent on the range, input section 140 retrieves a greater or smaller number of operation selection codes from the instruction word and supplies them to the functional units with a signal to execute the instructions.
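By way of illustration only, the range-dependent issue width described above may be sketched as follows. The address range, the two issue widths and the function names are hypothetical values chosen for the example and are not taken from the embodiments; the pipeline delay of the detection signal is not modelled.

```python
# Illustrative sketch only: the range and the widths are assumed values.
WIDE_RANGE = range(0x100, 0x140)  # assumed addresses of the inner loop
WIDE_COUNT = 6                    # instructions issued per word inside the range
NARROW_COUNT = 2                  # instructions issued per word elsewhere


def issue_count(address):
    """Number of instructions the input section takes from the word."""
    return WIDE_COUNT if address in WIDE_RANGE else NARROW_COUNT


def issue(address, word):
    """Return the operation codes that are forwarded to the functional units."""
    return word[:issue_count(address)]
```

In this sketch no instruction is needed to switch between the two widths; the instruction address alone determines how many fields of the word are used.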

FIG. 2 shows an embodiment of instruction memory system 12 for use with instructions of different length. Instruction memory system 12 contains a plurality of memory units 20, 22, a multiplexer 24, and clock gating circuits 28 a,b, and is coupled to a clock unit 26. An address input 23 of instruction memory system 12 is coupled at least partly to address the memory units 20, 22. A first one of the memory units 20 has a word size that is larger than the word size of a second one of the memory units 22. (Symbolically, first memory unit 20 is shown wider than second memory unit 22, to indicate the first memory unit's wider instruction word size, whereas second memory unit 22 is shown higher than first memory unit 20 to indicate that the second memory unit contains locations for a greater number of instruction words.) Instruction outputs of memory units 20, 22 are coupled to inputs of multiplexer 24, the instruction output of the second one of the memory units 22 being coupled to the input of the multiplexer 24 in combination with default input 29 (which supplies for example no-operation instructions). An output of multiplexer 24 is coupled to instruction execution unit 14 (not shown). Clock unit 26 is coupled to clock inputs of memory units 20, 22, each via a respective one of the clock gating circuits 28 a,b. An output of address range detector 16 is coupled to an input 11 of instruction memory system 12 that is coupled to a control input of multiplexer 24 and to disable inputs of clock gating circuits 28 a,b.

In operation, instruction memory system 12 outputs instruction words to instruction execution unit 14 in response to instruction addresses. When detector 16 indicates that the instruction addresses are in a first range, multiplexer 24 outputs instruction words from the first one of the memory units 20 to instruction execution unit 14. When detector 16 indicates that the instruction addresses are in a second range, multiplexer 24 outputs instruction words from the second one of the memory units 22 to instruction execution unit 14.

Typically, the first one of the memory units 20 contains instruction words from an inner loop of a program, that is, a part of a program that is repeatedly executed the highest number of times. Usually, the instruction words of such inner loops are optimised so that a maximum use can be made of parallel execution by instruction execution unit 14. Hence, the instruction words from the inner loop mostly contain instructions for a relatively large number of functional units. The second one of the memory units 22 outputs instruction words from outside the inner loop, which are executed less frequently. These instruction words contain instructions for relatively fewer functional units. Accordingly, the second one of the memory units 22 has a smaller instruction word size, outputting fewer bits in response to an instruction address than the first one of the memory units 20. Therefore, more efficient use of memory space is possible. The first one of the memory units 20 stores wider instruction words, which increases efficiency of execution in the inner loops. The first one of the memory units 20 may also be faster than the second one of the memory units 22, permitting shorter instruction cycles in the inner loop.

In principle each memory unit 20, 22 needs to respond only to addresses from a respective one of the address ranges. No memory space needs to be present for addresses in the range to which the other memory unit responds. However, in practice the second one of the memory units 22 may also be responsive to addresses in the range of the first one of the memory units 20. When this range only involves the inner loop, this would cause little memory overhead and it would permit locating the address range of the first one of the memory units 20 anywhere in the memory space of the second one of the memory units 22, so that the second one of the memory units 22 readily provides addresses both before and after this address range.

When the instruction address is not in the range supported by one of the memory units 20, 22, the clock supply to this memory unit 20, 22 is preferably disabled. Thus, power consumption is reduced. On the one hand, during execution of instruction words from the inner loop no clock needs to be supplied to the second one of the memory units 22. On the other hand, during execution of instructions from outside the inner loop no clock needs to be supplied to the first one of the memory units 20. Providing one or both of the memory units 20, 22 with a circuit for disabling its clock when the corresponding memory unit 20, 22 is not needed will reduce power consumption.
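The cooperation of the two memory units, the multiplexer with its default input, and the clock gating may be modelled in software as in the following sketch. It is a behavioural illustration under stated assumptions, not a description of the circuit: the class and attribute names are hypothetical, the narrow memory is modelled as a dictionary keyed by instruction address, and clock gating is represented merely by counting the cycles in which each unit is clocked.

```python
NOP = "nop"  # stands for the default input 29 (no-operation instructions)


class InstructionMemorySystem:
    """Behavioural model: a wide unit for one range, a narrow unit elsewhere."""

    def __init__(self, wide_words, narrow_words, wide_base, issue_width):
        self.wide = wide_words        # models memory unit 20 (list of wide words)
        self.narrow = narrow_words    # models memory unit 22 (dict: address -> word)
        self.wide_base = wide_base    # assumed first address of the wide range
        self.issue_width = issue_width
        self.clocked = {"wide": 0, "narrow": 0}  # cycles each unit received a clock

    def in_wide_range(self, address):
        """Models the detection signal of the address range detector."""
        return self.wide_base <= address < self.wide_base + len(self.wide)

    def fetch(self, address):
        """Models the multiplexer: select one unit and leave the other unclocked."""
        if self.in_wide_range(address):
            self.clocked["wide"] += 1
            return list(self.wide[address - self.wide_base])
        self.clocked["narrow"] += 1
        word = list(self.narrow[address])
        # A narrow word is padded with no-operation instructions.
        return word + [NOP] * (self.issue_width - len(word))
```

For example, with one wide word at address 1 and narrow words at addresses 0 and 2, three successive fetches clock the wide unit once and the narrow unit twice, and the narrow words are delivered padded to the full issue width.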

FIG. 2 shows two memories, each having locations that are addressed by the instruction address, the locations having different width, dependent on the memory. Although only two memories 20, 22 are shown, it will be understood that a greater number of such memories could be used, each with its own width and each for its own range of addresses. Thus, the width of the memory locations can be more closely adapted to the needs of different parts of the program.

Changing the number of instructions in the instruction words is only one way in which use can be made of the detection of the address range. In another aspect of the invention input section 140 uses the detected range to select which of the functional units 142, or groups of the functional units, should execute instructions from the instruction word. In a most basic VLIW processor, each instruction from an instruction word goes to a respective one of the functional units 142. This provides a high potential parallelism, but involves high memory usage. In more advanced VLIW processors, each instruction field from an instruction word can contain an instruction for a programmable one of a respective group of functional units 142. In this case, a code in the instruction word conventionally determines which functional unit of the group should execute the instruction. Still in this case, input section 140 would signal to the selected functional unit 142 that it should execute the instruction. In other more advanced VLIW processors, so-called super functional units are provided which are programmed by instructions that contain information from a plurality of fields in the instruction word, where each of these fields could or would normally be used for a separate instruction. Thus, for example instructions with an abnormally large number of operands can be conveyed.

FIG. 2A shows part of a data processor with an instruction word memory system 204, with address input 206, functional units 200 (or more generally groups 200 of functional units) and address range detector 208. Instruction memory system 204 has outputs for respective instructions from an addressed instruction word. These outputs are coupled to functional units (or respective groups of functional units) 200 and to a register file (not shown). Address range detector 208 receives the instruction word address and selects which of a number of the functional units (or respective groups of functional units) 200 will execute an instruction from an instruction word, dependent on the range of instruction addresses that the instruction address was detected to belong to. Although selection of functional units (or respective groups of functional units) 200 has been illustrated for one of the instructions, selection may of course be applied to any number of instructions from the instruction word. Thus, in a first embodiment address range detector 208 selects certain functional units (or respective groups of functional units) 200 to execute an instruction rather than other functional units if the instruction address is detected to be in a certain range, the other functional units being selected when the instruction address is not in that range. Thus smaller instruction words suffice. Accordingly, instruction memory system 204 may disable certain memory units, or the address step size between successive instruction addresses may be reduced. Both reduce the amount of memory needed for instruction words.
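The selection of FIG. 2A may be illustrated by the following sketch, in which the same instruction fields are routed to dedicated purpose units inside one address range and to general purpose units elsewhere. The unit names and the range are hypothetical and serve only the example.

```python
# Illustrative sketch only: unit names and the range are assumed.
GENERAL_UNITS = ["alu0", "alu1"]
DEDICATED_UNITS = ["mpeg0", "mpeg1"]
DEDICATED_RANGE = range(0x200, 0x280)  # assumed range of the dedicated code


def dispatch(address, word):
    """Map each instruction field to the unit selected for this address range."""
    units = DEDICATED_UNITS if address in DEDICATED_RANGE else GENERAL_UNITS
    return dict(zip(units, word))
```

Because the range selects the units, the instruction word in this sketch carries no field that names the target unit, which is the reason smaller instruction words suffice.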

In another embodiment instruction memory system 204 provides instructions to all functional units (or groups of functional units) from the instruction words when the instruction address is in a certain range. In this embodiment, address range detector 208 selects all functional units for executing instructions in response to detection that the instruction address was in that range. When the instruction address is outside the range, address range detector 208 selects only a subset of the functional units.

FIG. 2B shows an embodiment that additionally contains multiplexers 202, controlled by address range detector 208, between instruction memory system 204 and certain of functional units (or respective sub-groups of functional units) 200. One input of each of multiplexers 202 is coupled to a common instruction output of instruction memory system 204 for sharing by a group of functional units 200. For the sake of clarity only one output from instruction memory system 204 is shown to symbolize connections to functional units 200 and the register file. Multiplexers 202 each have another input coupled to a respective different instruction output of instruction memory system 204.

When address range detector 208 detects that the instruction address is in a certain range, address range detector 208 may increase the number of groups of functional units 200 from which functional units are selected to execute an instruction from an instruction word, for example by splitting the group into two or more subgroups. When the instruction address is in the certain range, address range detector 208 selects multiplexers 202 to supply different respective ones of the instructions from the instruction word to different functional units 200 or subgroups of functional units of a group of functional units 200. When the instruction address is outside the range, address range detector 208 supplies the same instruction from the instruction word to all functional units in the group. In this case, a smaller instruction word is needed. Accordingly, instruction memory system 204 may disable certain memory units, or the address step size between successive instruction addresses may be reduced. Both reduce the amount of memory needed for instruction words.
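The role of multiplexers 202 may be sketched as follows for a single group of functional units. The function name and its parameters are hypothetical; the sketch models only which instruction each unit of the group sees, not which unit of the group eventually executes a shared instruction.

```python
def route(word, in_range, n_units):
    """Return the instruction seen by each of n_units units of one group."""
    if in_range:
        # The multiplexers pass a different instruction output to each unit,
        # so the group is effectively split into subgroups.
        return list(word[:n_units])
    # Outside the range every unit sees the common instruction output.
    return [word[0]] * n_units
```

Inside the range the word must hold one field per unit, whereas outside the range a single shared field is enough for the whole group.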

Of course, more complicated forms of regrouping may be used, (subgroups of) functional units 200 being part of one group in one range and part of another group in another range, and/or forming a group by itself in a further range. Furthermore, the embodiment of FIG. 2A may be combined with that of FIG. 2B, so that different functional units 200 may be selected to execute one instruction from an instruction word dependent on the range of the instruction address, the functional units 200 each receiving their own instruction word in parallel in another range.

Of course, the embodiments of FIGS. 2, 2A and 2B may be combined, memory units being provided only for address ranges where instructions from these memory units are needed.

Also, input section 140 may add functional units to or remove functional units from the groups dependent on the range.

FIG. 3 shows an embodiment of address range detector 16. Address range detector 16 contains a lower bound source 30, an upper bound source 32, a lower bound comparator 34, an upper bound comparator 36 and an AND gate 38. The lower bound source 30 is coupled to a first input of lower bound comparator 34 and the upper bound source 32 is coupled to a first input of upper bound comparator 36. An input for the instruction address is coupled to second inputs of upper bound comparator 36 and lower bound comparator 34. The outputs of upper bound comparator 36 and lower bound comparator 34 are coupled to inputs of AND gate 38, whose output is the output of address range detector 16.

In operation comparators 34, 36 compare the instruction address with the upper and lower bound from sources 30, 32. When the instruction address is between these bounds AND gate 38 outputs one signal value and, if not, it outputs another signal value. Sources 30, 32 may be hardwired when it is known in which address range programs contain instructions in the inner loop. Alternatively, sources 30, 32 may contain one or more registers whose content determines an upper bound value and a lower bound value (by setting both upper and lower bound values, or by setting for example only a lower bound, the upper bound having a predetermined offset with respect to the lower bound). These registers may be loaded with appropriate values when the program is loaded into memory units 20, 22. Alternatively, these values may be set as a result of execution of instructions.
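A software model of the detector of FIG. 3 may look as follows. The class and method names are hypothetical, and treating both bounds as inclusive is an assumption of the sketch, since the description only requires the address to lie between the bounds.

```python
class AddressRangeDetector:
    """Two comparators and an AND gate, with loadable bound registers."""

    def __init__(self, lower, upper):
        self.lower = lower  # models lower bound source 30
        self.upper = upper  # models upper bound source 32

    def set_lower(self, lower, offset):
        """Load only the lower bound; the upper bound follows at a fixed offset."""
        self.lower = lower
        self.upper = lower + offset

    def detect(self, address):
        above_lower = address >= self.lower  # lower bound comparator 34
        below_upper = address <= self.upper  # upper bound comparator 36 (inclusive: assumed)
        return above_lower and below_upper   # AND gate 38
```

Loading the registers through `set_lower` corresponds to the variant in which only a lower bound is set and the upper bound has a predetermined offset with respect to it.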

Preferably, the address range for which the first one of the memory units 20 stores relatively wide instructions is also adjustable. This may be realized by using only a less significant part of the instruction address to address the first one of the memory units 20, the address range detection being used to activate the first one of the memory units 20. Alternatively, the lower bound may be subtracted from the instruction address and used as address for the first one of the memory units 20 in this case. When one or more registers are used to provide the bounds, the relevant address ranges can thus be set by loading these registers. Thus, the address ranges with wider instructions can be set dependent on the program involved. In preparation for executing a part of a program that involves entering and exiting an inner loop, the bounds can be loaded.
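The two ways of deriving a local address for the wide memory unit may be sketched as follows. The function names and the number of index bits are hypothetical; the masked variant yields the offset within the range only if the range starts at a multiple of the memory size, which is an assumption of the example.

```python
def index_by_subtraction(address, lower_bound):
    """Local address obtained by subtracting the lower bound."""
    return address - lower_bound


def index_by_low_bits(address, index_bits):
    """Local address obtained from the less significant address bits only."""
    return address & ((1 << index_bits) - 1)
```

With a range that starts at 0x100 and a memory addressed by four bits, both functions map instruction address 0x105 to local address 5.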

Of course, address range detector 16 can easily be extended to distinguish between more than two ranges or to detect instruction addresses in ranges with disjoint parts. In this case a different treatment may be given to instructions from each range and memory units 20, 22 may be provided for each range.

More generally a memory mapping unit (MMU) may be used to select at run time which physical memory is addressed by which logical instruction address. In this case, according to one aspect of the invention, the width of different physical memories differs. When a program is loaded, instruction words from an inner loop are stored in a wider memory. The MMU is set to map the logical instruction addresses of these instruction words to physical addresses in the wider physical instruction memory. During execution the MMU maps the logical instruction addresses accordingly and the memory returns instruction words with a width that depends on the physical memory that is physically addressed by the MMU.
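The mapping by an MMU onto physical memories of different width may be illustrated as follows. The table-based `MMU` class, the memory names and the `fetch` helper are hypothetical simplifications; a practical MMU would map pages rather than individual addresses.

```python
class MMU:
    """Maps a logical instruction address to (physical memory, physical address)."""

    def __init__(self):
        self.table = {}

    def map(self, logical, memory, physical):
        self.table[logical] = (memory, physical)

    def translate(self, logical):
        return self.table[logical]


def fetch(mmu, memories, logical):
    """Return the instruction word; its width follows from the memory addressed."""
    name, physical = mmu.translate(logical)
    return memories[name][physical]
```

When the loader stores the inner loop in the wider memory and sets the table accordingly, the width of each fetched word follows from the mapping alone.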

FIG. 4 shows examples of instruction words 40, 42 that may be supplied to instruction execution unit 14. A first type of instruction word 40 is used when the instruction address is in one range and a second type of instruction word 42 is used when the instruction address is in another range. The first type of instruction word 40 is wider, containing more instructions 44 a-f than the second type of instruction word 42 (with instructions 46 a,b and a code 47 for selecting which of the functional units 142 should execute the instructions).

Different treatment of instruction words may be implemented by design of input section 140, with address range dependent routing of instructions from the received instruction words to the functional units. Alternatively, different treatment may be implemented using conventional processing of compressed instruction words. In this case, the apparatus adds or modifies codes that are provided to guide decompression of the instruction words, the codes being constructed dependent on the address range of the instruction. The codes are supplied to instruction execution unit 14, causing instruction execution unit 14 to treat the instruction words dependent on the address range as a result of the added or modified codes. For example, a code indicating that all groups of functional units should process instructions could be added when the instruction address is in a first range, whereas a code from instruction memory, indicating a selection of a subset of the functional units, could be supplied when the instruction address is in a second range, or the latter code could be generated when the same functional units should always be used when the instruction address is in the second range.

The functional units may include dedicated purpose functional units, such as units that perform functions to speed up MPEG decoding or encoding. Typically only specific parts of programs contain instruction words with instructions for such special purpose functional units. By accepting instructions for these functional units only when the instruction address is in a certain range, it is not necessary to provide instruction space for such functional units for instructions with addresses outside that range. In this case, the instruction word in both ranges may even have the same width. Because it is known that no instructions for a subset of the functional units are encoded in the instruction words when the instruction address is outside a certain range, more space is available for encoding instructions for other functional units in instruction words that are stored outside that certain range.

FIG. 5 shows a flow chart for programming the processing apparatus of FIG. 1. In a first step 51 of the flow chart a program is compiled and instructions are generated for executing the program. In a second step 52, the position of the inner loop (or loops) in the program is determined. This may be done by automatic code inspection, or by profiling (that is, counting the number of times different instructions are executed during trial execution for typical input data). In a third step 53, the instruction words are formed, the instruction words in the inner loops being optimised, for example by using known techniques such as (partial) loop unrolling, or by providing instructions for special purpose functional units. In a fourth step 54 the instruction words are loaded into instruction memory system 12 so that the instruction words in the inner loop are stored at memory locations with instruction addresses in the range where instruction memory system 12 stores wider instruction words, or where instruction execution unit 14 selects to execute more instructions from the instruction word in parallel. Alternatively, the bounds of the range are set according to the locations where the instruction words of the inner loop have been loaded.
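The profiling variant of the second step, followed by setting the bounds, may be sketched as follows. The function name, the trace format and the threshold are hypothetical; the sketch further assumes that the frequently executed addresses form a single contiguous inner loop.

```python
from collections import Counter


def inner_loop_bounds(trace, threshold):
    """Derive range bounds from a trace of executed instruction addresses.

    Addresses executed at least `threshold` times are taken to belong to
    the inner loop (an assumption of this sketch).
    """
    counts = Counter(trace)
    hot = [address for address, n in counts.items() if n >= threshold]
    if not hot:
        return None
    return min(hot), max(hot)
```

The returned pair could be loaded into the bound registers of the address range detector after the instruction words of the inner loop have been placed at those addresses.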

FIG. 6 shows a further instruction memory system 60 for implementing an aspect of the invention. Instruction memory system 60 has a single instruction address input 64 and a controller 66, and contains a plurality of memory units 62 a-d. At least part of instruction address input 64 is coupled to address inputs of each of the memory units 62 a-d. At least part of the instruction address input is coupled to controller 66. Controller 66 is coupled to each of the memory units 62 a-d individually to make each memory unit 62 a-d responsive to instruction addresses in a respective address range that is particular to the memory. The address ranges of different memory units 62 a-d may overlap and they may contain mutually different numbers of instruction addresses. (This is symbolized by the fact that the memory units 62 a-d span different vertical height ranges in the figure). Similarly, each memory unit 62 a-d may have its own width, i.e. the width of the instruction data that is addressed with an instruction address may differ from one memory to another. (This is symbolized by the fact that the memory units 62 a-d span different horizontal widths in the figure).

In operation a processing unit (not shown) supplies successive instruction addresses to instruction memory system 60. Dependent on the value of the instruction address, controller 66 signals one or more of the memory units 62 a-d to respond. The selected memory units 62 a-d each retrieve part of an instruction word (or the whole instruction word, if only one memory unit 62 a-d is selected). The parts of the instruction word from different ones of the memory units 62 a-d are supplied, in combination, as an instruction word to the processing unit (not shown). Preferably clock signals in one or more of the memory units 62 a-d are disabled when they are not selected.
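The operation of controller 66 and memory units 62 a-d may be modelled as in the following sketch, in which each unit responds to its own, possibly overlapping, address range and the parts read from the selected units are concatenated. The class and function names are hypothetical.

```python
class MemoryUnit:
    """One memory unit with its own address range and its own word width."""

    def __init__(self, first, words):
        self.first = first  # first instruction address the unit responds to
        self.words = words  # one partial instruction word per address

    def responds_to(self, address):
        return self.first <= address < self.first + len(self.words)

    def read(self, address):
        return self.words[address - self.first]


def fetch(units, address):
    """Model of the controller: select the responding units and combine their parts."""
    word, selected = [], []
    for index, unit in enumerate(units):
        if unit.responds_to(address):
            selected.append(index)
            word.extend(unit.read(address))
    return word, selected
```

The list of selected indices corresponds to the selection signals of the controller; units that are not in the list could have their clock disabled in the cycle concerned.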

FIG. 7 shows a processor using memory system 60 of FIG. 6. The processor contains a plurality of functional unit groups 70 a-g, a register file 72 and a program counter 74. Each functional unit group may contain one or more functional units (not shown). The selection outputs of controller 66 in memory system 60 are coupled to clock enable inputs of the functional unit groups 70 a-g. The instruction outputs of the memory units (not shown for the sake of clarity) of instruction memory system 60 are coupled to instruction selection inputs of the functional unit groups 70 a-g, to operand register selection inputs and to result register selection inputs of register file 72. Functional unit groups 70 a-g have operand inputs and result outputs coupled to register file 72 (for the sake of clarity all these connections are shown by a single line, although in practice independent connections are used).

In operation, each memory unit of instruction memory system 60 is dedicated to one or more groups of functional units 70 a-g. Clock signals in one or more of the groups of functional units 70 a-g are disabled when the selection signals from controller 66 indicate that the corresponding memory unit of the group of functional units 70 a-g is not selected. Thus, the functional units in the group receive no clock signals and power consumption is further reduced.
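The relation between the selection signals and the clock enable inputs can be sketched as follows. The assignment of functional unit groups to memory units in this table is a hypothetical example; the description only states that each memory unit is dedicated to one or more groups, not which ones.

```python
from typing import Dict, List, Set

# Hypothetical fixed assignment of functional unit groups 70 a-g
# to memory units 62 a-d (an assumption made for illustration only).
GROUPS_OF_UNIT: Dict[str, List[str]] = {
    "62a": ["70a"],
    "62b": ["70b", "70c"],
    "62c": ["70d", "70e"],
    "62d": ["70f", "70g"],
}


def clock_enables(selected_units: Set[str]) -> Dict[str, bool]:
    """Derive the clock enable of each functional unit group.

    A group is clocked only when the memory unit it is dedicated to
    has been selected for the current instruction address.
    """
    enables: Dict[str, bool] = {}
    for unit, groups in GROUPS_OF_UNIT.items():
        for group in groups:
            enables[group] = unit in selected_units
    return enables


# Address in a range served by memory unit 62a only: one group is clocked.
narrow_case = clock_enables({"62a"})
# Address in a range served by 62a and 62b: three groups are clocked.
wider_case = clock_enables({"62a", "62b"})
```

With this assumed table, an instruction address that selects only memory unit 62 a leaves six of the seven groups without a clock, which is the situation in which the power saving is largest.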

However, it will be understood that the instruction memory system of FIG. 6 can be used independently of the embodiment of FIG. 7. That is, the clock signals in the functional unit groups need not be disabled. Neither is it necessary that there is a fixed relation between memory units and functional unit groups (although such a fixed relation speeds up processing and simplifies the circuit).

1. A data processing apparatus, the apparatus comprising: an instruction address generation circuit for outputting an instruction address; an instruction memory system arranged to output an instruction word addressed by the instruction address, including at least one type of memory selected to achieve a desired instruction cycle time wherein longer instruction words are stored in said memory system within memory ranges of progressively shorter instruction words associated with a corresponding memory type; an instruction execution unit, arranged to process a plurality of instructions from the instruction word in parallel; a detection unit, arranged to detect in which of a plurality of ranges the instruction address lies, the detection unit being coupled to the instruction execution unit to control a way in which the instruction execution unit parallelizes processing of the instructions from the instruction word, dependent on a detected range.
2. A data processing apparatus according to claim 1, wherein the instruction execution unit and/or the instruction memory system is arranged to adjust a width of the instruction word that determines a number of instructions from the instruction word that is processed in parallel, dependent on the detected range.
3. A data processing apparatus according to claim 1, wherein the instruction execution unit comprises a plurality of functional units, the instruction execution unit being arranged to select a subset of the functional units that is available for processing the instruction, dependent on the detected range.
4. A data processing apparatus according to claim 1, wherein the instruction execution unit comprises a plurality of functional units, the instruction execution unit being arranged to select whether functional units or groups of functional units from a set of functional units each receive respective instructions from the instruction word, or receive a shared instruction from the instruction word, dependent on the detected range.
5. A data processing apparatus according to claim 2, wherein the instruction memory comprises a first memory unit and a second memory unit, providing storage with a first and second unit of width of addressable memory locations for instruction words of different lengths with addresses in a first and second range respectively, the first and second unit of width being mutually different.
6. A data processing apparatus of claim 5, programmed to execute a program, longer instruction words from an inner loop of the program being stored in the first memory unit, shorter instruction words from a majority of the program outside the inner loop being stored in the second memory unit, the first unit of width being larger than the second unit of width.
7. A data processing apparatus according to claim 5, comprising a memory mapping unit arranged to map the instruction address onto the first memory unit or the second memory unit, dependent on the detected range.
8. A data processing apparatus according to claim 5, wherein the instruction memory system is arranged to disable supply of clock signals to the first memory unit when addresses in the second range are detected.
9. A data processing apparatus according to claim 5, wherein the instruction memory system is arranged to disable supply of clock signals to all but the memory unit from whose address range addresses are detected.
10. A data processing apparatus according to claim 2, wherein the instruction memory system comprises a plurality of memory units, each arranged to be responsive to instruction addresses in a respective range, the instruction memory allowing partial overlap of the respective range, the instruction memory system being arranged to supply the instruction word as a combination of instructions from those of the memory units in whose respective range the instruction address lies.
11. A data processing apparatus according to claim 10, wherein the instruction memory system is arranged to disable supply of clock signals to at least one of the memory units when the instruction address is not in the respective range of said at least one of the memory units.
12. A data processing apparatus according to claim 10, wherein the execution unit comprises groups of one or more functional units, each group being coupled to a respective predetermined one of the memory units, for receiving instructions from the instruction words, when the instruction address is in the respective range of the respective predetermined one of the memory units to which the group is coupled.
13. A method of programming a data processing apparatus comprising: generating a program of machine instructions for the apparatus; identifying an inner loop of the program; loading the program into the instruction memory system, said memory system includes at least one type of memory selected to achieve a desired instruction cycle time, so that instructions from the inner loop are loaded at memory locations with instruction addresses in a range of addresses for which the apparatus provides a higher degree of parallelism than another range of addresses, wherein longer instruction words are stored in said memory system within memory ranges of progressively shorter instruction words and are associated with a corresponding memory type.
14. A method of executing a program with a data processing apparatus, the method comprising: using an instruction address to fetch an instruction word; executing instructions from the fetched instruction word; detecting in which of a plurality of ranges the instruction address lies; controlling a way in which instruction execution is parallelized dependent on a detected range, wherein longer instruction words are contained within ranges of shorter instruction words and instruction words are stored in a type of memory selected to achieve a desired instruction cycle time.
15. A method according to claim 14, the method comprising adapting a width of the fetched instruction word dependent on the detected range.
16. A method according to claim 14, the method comprising changing a selection of functional units of the apparatus that is used to execute the instructions dependent on the detected range.